Reproducible Publications w/ Python and Quarto


Tom Mock

 thomasmock.quarto.pub/python

2022-12-03

What is Quarto?

https://quarto.org

Quarto is an open-source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication.

  • Computations: Python, R, Julia, Observable JS
  • Markdown: Pandoc flavored markdown with many enhancements
  • Output: Documents, presentations, websites, books, blogs

Literate programming system in the tradition of Org-Mode, Weave.jl, R Markdown, iPyPublish, Jupyter Book, etc.

Origins

  • Open Source project sponsored by Posit, PBC (formerly known as RStudio, PBC)
  • 10 years of experience with R Markdown, a similar system that was R-specific, convinced us that the core ideas were sound
  • The number of languages and runtimes used for scientific discourse is broad
  • Quarto is a ground-up re-imagining of R Markdown that is fundamentally multi-language and multi-engine
  • Quarto gets inspiration from both R Markdown and Jupyter, and provides a plain-text option or the use of native Jupyter notebooks

Goal: Computational document

  • Documents that include source code for their production
  • Notebook AND plain-text flavors
  • Programmatic automation and reproducibility

Goal: Scientific Markdown

Goal: Single Source Publishing

Simple Example

---
title: "matplotlib demo"
format:
  html:
    code-fold: true
jupyter: python3
---

For a demonstration of a line plot on a polar 
axis, see @fig-polar.
```{python}
#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"

import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
  subplot_kw = {'projection': 'polar'} 
)
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()
```

Simple Example, multi-format


Can be rendered to dozens of output formats with Quarto (via Pandoc):

quarto render hello.qmd --to html
quarto render hello.qmd --to pdf
quarto render hello.qmd --to docx
quarto render hello.qmd --to epub
quarto render hello.qmd --to pptx
quarto render hello.qmd --to revealjs
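
The same set of commands can be generated from a script. The following is a minimal sketch, not part of Quarto itself: `build_render_commands` is a hypothetical helper, and the commented-out `subprocess.run` call assumes the `quarto` CLI is installed and on the PATH.

```python
import shlex


def build_render_commands(source, formats):
    """Return one `quarto render` argument list per requested format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


formats = ["html", "pdf", "docx", "epub", "pptx", "revealjs"]
commands = build_render_commands("hello.qmd", formats)

for argv in commands:
    # Only print the command here. To actually render, one could call
    # subprocess.run(argv, check=True), assuming quarto is on the PATH.
    print(shlex.join(argv))
```

Building argument lists rather than shell strings avoids quoting problems if a file name contains spaces.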

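| Feature          | R Markdown      | Quarto                       |
|------------------|-----------------|------------------------------|
| Cross References |                 |                              |
| Websites & Blogs |                 |                              |
| Books            |                 |                              |
| Interactivity    | Shiny Documents | Quarto Interactive Documents |
| Paged HTML       | pagedown        | Coming soon!                 |
| Journal Articles | rticles         | Out and more coming!         |
| Dashboards       | flexdashboard   | Coming soon!                 |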

So what is Quarto?

Quarto® is an open-source scientific and technical publishing system built on Pandoc.

  • quarto is a language agnostic command line interface (CLI)
thomasmock$ quarto --help
Usage:   quarto
Version: 1.2.269

Commands:
  render  [input] [args...] - Render input file(s) to various document types.            
  preview [file] [args...]  - Render and preview a document or website project.          
  publish [provider] [path] - Publish a document or project.

A .qmd is a plain text file

Metadata (YAML)

format: html
jupyter: python3

format: html
engine: knitr

Code

```{python}
from siuba import _, group_by, summarize
from siuba.data import mtcars
(mtcars
  >> group_by(_.cyl)
  >> summarize(avg_mpg = _.mpg.mean()))
```
```{r}
library(dplyr)
mtcars |> 
  group_by(cyl) |> 
  summarize(mean = mean(mpg))
```

Text

# Heading 1
This is a sentence with some **bold text**, some *italic text* and an 
![image](image.png){fig-alt="Alt text for this image"}.
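
Because a .qmd is plain text, its parts can be pulled apart with ordinary string handling. The sketch below is illustrative only and is not how Quarto parses documents: `split_qmd` is a hypothetical helper that assumes the YAML header starts on the first line with `---` and ends at the next line that is exactly `---`.

```python
def split_qmd(text):
    """Split .qmd text into (yaml_header, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no header at all
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    return "", text  # no closing delimiter: treat everything as body


doc = """---
format: html
jupyter: python3
---

# Heading 1
This is a sentence with some **bold text**.
"""

header, body = split_qmd(doc)
print(header)
print(body.strip())
```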

But Quarto doesn’t have to be plain-text

Rendering pipeline

Plain text workflow (.qmd uses Jupyter kernel to execute cells):

Notebook workflow (defaults to using existing stored computation):

What to do with my existing .ipynb?

You can keep using them! You get to choose whether to use the stored computation OR re-execute the document from top to bottom.


# --execute is optional; it ignores stored computation and re-executes the notebook
quarto render my-favorite.ipynb --to html --execute
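
To see what "stored computation" means in practice, it can help to look inside a notebook. The sketch below assumes the standard nbformat 4 JSON layout, where each code cell carries an `outputs` list; the tiny notebook is hand-written for illustration and `stored_output_summary` is a hypothetical helper.

```python
import json


def stored_output_summary(notebook):
    """Count code cells and how many of them carry stored outputs."""
    code_cells = [
        cell for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    with_outputs = [cell for cell in code_cells if cell.get("outputs")]
    return {"code_cells": len(code_cells), "with_outputs": len(with_outputs)}


# A minimal hand-written notebook (illustrative only).
raw = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "source": ["print('not run yet')"], "outputs": []},
    ],
})

summary = stored_output_summary(json.loads(raw))
print(summary)
```

A notebook where some code cells have no outputs is a likely candidate for rendering with `--execute`.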


Quarto can help convert back and forth between plain text .qmd and .ipynb:

quarto convert --help

Usage:   quarto convert <input>

Description:
    Convert documents to alternate representations.

Convert notebook to markdown:                quarto convert doc.ipynb                
Convert markdown to notebook:                quarto convert doc.qmd                  
Convert notebook to markdown, write to file: quarto convert doc.ipynb --output doc.qmd

nbdev + Quarto = super powers

A tweet by Jeremy Howard, FYI nbdev will be moving to Quarto and Fastdoc probably too

A tweet by Hamel Husain, 'I'm going to be announcing an epic new version of nbdev in this talk! The next version of nbdev is going to be built on top of Quarto'

fast.ai - nbdev+Quarto: A new secret weapon for productivity

Comfort of your own workspace

A screenshot of a Quarto document rendered inside JupyterLab

A screenshot of a Quarto document rendered inside VSCode

A screenshot of a Quarto document rendered inside RStudio

Auto-completion in RStudio + VSCode


Both RStudio and VSCode with the Quarto extension have rich auto-completion

YAML

A gif of auto-completion and search for YAML options inside RStudio

Chunk option

A gif of auto-completion of an R chunk inside RStudio

Interactivity with Jupyter Widgets

import plotly.express as px
import plotly.io as pio
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", 
                 color="species", 
                 marginal_y="violin", marginal_x="box", 
                 trendline="ols", template="simple_white")
fig.show()

Interactivity, Observable

Quarto also includes native support for Observable JS, a set of enhancements to vanilla JavaScript created by Mike Bostock (also the author of D3)

Quarto, also with Observable JavaScript!


```{ojs}
viewof temp = Inputs.range([0, 100], {step: 1, value: 34, label: htl.html`Temp &#x2103;`})
```

Converting temperature from &#x2103; to &#x2109; <br>  
Celsius = ${d3.format(".0f")(temp)}&#x2103; and Fahrenheit = ${d3.format(".1f")(temp * 9/5 + 32)}&#x2109;.
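
For readers more at home in Python, the same conversion can be sketched without the interactive slider. The function name `celsius_to_fahrenheit` is an assumption for illustration; the formula and the formatting precision mirror the OJS expression above.

```python
def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


temp = 34  # same default value as the slider in the OJS cell
message = (
    f"Celsius = {temp:.0f}℃ and "
    f"Fahrenheit = {celsius_to_fahrenheit(temp):.1f}℉."
)
print(message)
```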

Quarto, unified document layout

quarto render boston-terrier.qmd --to html
quarto render boston-terrier.qmd --to pdf

A screenshot of an HTML article about Boston Terriers, the document has an image in the right-hand margin, a floating table of contents, and different sections split up by headers

HTML

A screenshot of a PDF article about Boston Terriers, the document has an image in the right-hand margin, a floating table of contents, and different sections split up by headers

PDF

Quarto, unified syntax

Say hello to “fenced divs”:

::: {.class}
... content ...
:::

And to bracketed spans

And to [bracketed spans]{.fragment}

Quarto uses these to apply common syntax across formats (PDF, HTML, docx, pptx, etc)!

Quarto, unified syntax

This syntax also applies across code-chunks

::: {layout-ncol=2}
![](image1.png)

![](image2.png)
:::
```{python}
#| layout-ncol: 2

from plotnine import ggplot, geom_point, geom_boxplot, aes, stat_smooth, facet_wrap, theme
from plotnine.data import mtcars

# plot 1 in column 1
plot1 = (
    ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
    + geom_point()
    + stat_smooth(method='lm')
    + facet_wrap('~gear')
).draw(show=True)

# plot 2 in column 2
plot2 = (
    ggplot(mtcars, aes('cyl', 'mpg', color='factor(cyl)'))
    + geom_boxplot()
).draw(show=True)
```

Built-in vs custom

One goal of Quarto is to provide a markdown-centric format-agnostic syntax as shown in previous slides.

  • Quarto bundles Bootstrap CSS and themes, and respects SASS variables for robust styling of HTML content (HTML documents, websites, books, slides, etc).
  • Quarto includes LaTeX templates for specific journals as well as good defaults for PDF outputs in general.
  • You shouldn’t HAVE to escape out to writing raw LaTeX, HTML, Jinja templates, etc.
  • In the vast majority of situations, you can rely purely on Markdown syntax
  • BUT you can always include raw content such as LaTeX, CSS, HTML, JavaScript to further customize and optimize for a specific format.

Quarto Projects + Freeze

Quarto projects are directories that provide:

  • A way to render all or some of the files in a directory with a single command (e.g. quarto render myproject).

  • A way to share YAML configuration across multiple documents.

  • The ability to redirect output artifacts to another directory.

  • The ability to freeze rendered output (i.e. don’t re-execute documents unless they have changed).

Quarto Projects

Controlled via a _quarto.yml config file.

A _quarto.yml file might look like:

# define as project, set output directory
project:
  output-dir: _output

# add global YAML options
toc: true
number-sections: true
bibliography: references.bib  
  
# define the multi-formats that will be rendered
format:
  # HTML w/ specific css
  html:
    css: styles.css
    html-math-method: katex
  # PDF with specific sizing
  pdf:
    documentclass: report
    margin-left: 30mm
    margin-right: 30mm
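
One way to picture shared configuration is as project-level options that each document can override. The sketch below is a conceptual illustration only: `merge_options` is a hypothetical helper, the document options are invented, and Quarto's actual merge rules (for lists, for example) may differ.

```python
def merge_options(project, document):
    """Recursively merge two option dicts; document-level values win."""
    merged = dict(project)
    for key, value in document.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = value
    return merged


# Options in the spirit of the _quarto.yml above.
project_options = {
    "toc": True,
    "number-sections": True,
    "format": {"html": {"css": "styles.css", "html-math-method": "katex"}},
}

# Hypothetical front matter of one document in the project.
document_options = {
    "title": "EDA",
    "toc": False,
    "format": {"html": {"code-fold": True}},
}

effective = merge_options(project_options, document_options)
print(effective)
```

Here the document turns off the table of contents for itself while still inheriting section numbering and the HTML styling from the project.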

Stored/frozen computation

Jupyter natively approaches this as storing the source code, output file, and computation in a single document (.ipynb which is JSON).


Quarto also provides a method, but approaches it differently:

  • Source code input (plain text .qmd or .ipynb)
  • Output file (some format like .html or .pdf)
  • Frozen computation via freeze: true, stored by directory and file as .json and other intermediary files

Quarto Projects

  • A project may have multiple contributors or even languages
  • Quarto projects can render different documents in their respective languages (R, Python, Julia, JavaScript or raw markdown).


├── _quarto.yml #<- define a Quarto project
├── .venv #<- use a python virtual env 
├── exploratory
│   ├── eda.ipynb
│   ├── dplyr-summary.qmd
│   └── plotnine-plots.qmd
├── hyperparameter-tuning
│   ├── grid-define.qmd
│   ├── torch-train.ipynb
│   ├── vetiver-pin.qmd
│   └── vetiver-deploy.py
├── plots
│   ├── density-plot.png
│   ├── roc-curve.png
│   └── accuracy-vs-grid.png
├── _freeze #<- frozen output stored for specific content
│   └── hyperparameter-tuning
│       └── execute-results
│           └── out.json

Quarto Projects

Freeze is generally used when you have either a large number of collaborators or many computational documents created over a longer period of time.

execute:
  freeze: true  # never re-render during project render,
                # unless you explicitly render THAT file

execute:
  freeze: auto  # re-render only when source changes
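
The difference between the two settings can be sketched as a small decision function. This is a conceptual illustration under stated assumptions, not Quarto's implementation: the hashing scheme, the function names, and the choice to execute when no frozen result exists are all assumptions made for the example.

```python
import hashlib


def source_hash(source_text):
    """Fingerprint the source so that any edit changes the hash."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def should_execute(source_text, frozen_hash, freeze):
    """Decide whether to re-execute a document during a project render.

    freeze=True   -> reuse frozen results whenever they exist
    freeze="auto" -> re-execute only if the source changed
    freeze=False  -> always re-execute
    """
    # Assumption: with nothing frozen yet, the document has to run once.
    if frozen_hash is None or freeze is False:
        return True
    if freeze is True:
        return False
    return source_hash(source_text) != frozen_hash  # freeze == "auto"


original = "x = 1\n"
frozen = source_hash(original)

print(should_execute(original, frozen, "auto"))   # unchanged source
print(should_execute("x = 2\n", frozen, "auto"))  # edited source
print(should_execute("x = 2\n", frozen, True))    # frozen regardless of edits
```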

Extending Quarto with extensions

Shortcodes

  • Replace inline “short codes” with output.
{{< fa thumbs-up >}} 
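
The idea of replacing an inline shortcode with output can be sketched in a few lines. This is only a Python illustration of the concept: real Quarto shortcode extensions are written in Lua, and the handler table, the `fa_icon` function, and the HTML it emits are hypothetical.

```python
import re

# Matches {{< name arg1 arg2 ... >}} and captures the inner text.
SHORTCODE = re.compile(r"\{\{<\s*(.*?)\s*>\}\}")


def fa_icon(args):
    """Hypothetical handler: render an icon name as an HTML tag."""
    if not args:
        raise ValueError("fa shortcode needs an icon name")
    return f'<i class="fa-solid fa-{args[0]}"></i>'


HANDLERS = {"fa": fa_icon}  # shortcode name -> handler function


def expand_shortcodes(text, handlers=HANDLERS):
    """Replace each known shortcode with its handler's output."""
    def replace(match):
        parts = match.group(1).split()
        if not parts:
            return match.group(0)
        name, args = parts[0], parts[1:]
        handler = handlers.get(name)
        # Leave unknown shortcodes untouched rather than guessing.
        return handler(args) if handler else match.group(0)

    return SHORTCODE.sub(replace, text)


result = expand_shortcodes("Nice work {{< fa thumbs-up >}}")
print(result)
```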


Filters

  • Affect rendering of specific items

A screenshot of a code chunk

Formats

  • Add entirely custom new formats
---
title: "Cool Company 2022 Presentation"
format: coolco-revealjs
---

Quarto Publish

quarto publish --help

  Usage:   quarto publish [provider] [path]
  Version: 1.2.269                          
                                           
  Description:
    Publish a document or project. Available providers include:
                                                               
     - Quarto Pub (quarto-pub)                                 
     - GitHub Pages (gh-pages)                                 
     - Posit Connect (connect)                               
     - Netlify (netlify)                                       

Screenshot of the quartopub.com website

Quarto, crafted with love and care

Development of Quarto is sponsored by Posit, PBC (formerly known as RStudio, PBC). The same core team works on both Quarto and R Markdown:

Here is the full contributors list. Quarto is open source, and we welcome contributions in our GitHub repository as well: https://github.com/quarto-dev/quarto-cli

Quarto

  • Batteries included, shared syntax across output types and languages
  • Single source publishing across document types, with raw customization allowed
  • Choose your own editor for plain text .qmd or Jupyter notebooks
  • Quarto projects + freeze for managing stored computation

Follow @quarto_pub or me @thomas_mock on Twitter/Mastodon to stay up to date!


Web resources

Quarto resources

General Quarto

Why the name “Quarto”?